Abstract. In this paper we study the exponential decay of the posterior probability
of a set of sources, and conditioning by rare sources, for both uniform and general
prior distributions of sources. The decay rate is determined by the L-divergence, and
rare sources from a convex, closed set asymptotically conditionally concentrate on
an L-projection. The L-projection on a linear family of sources belongs to the
$\Lambda$-family of distributions. The results parallel those of Large Deviations for
Empirical Measures (Sanov's Theorem and the Conditional Limit Theorem).



1. Introduction 

Information divergence minimization, also known as Relative Entropy
Maximization or the MaxEnt method, has gained a firm probabilistic footing thanks
to the Large Deviations Theorems for Empirical Measures. This footing justifies its
application to the convex Boltzmann Jaynes Inverse Problem (the $\alpha$-problem,
for short). For the $\beta$-problem, an 'antipode' of the $\alpha$-problem, the Large
Deviations Theorems for Sources presented here single out the L-divergence
maximization method.

The paper is organized as follows. First, the necessary terminology and notation
are introduced. A brief survey of the Large Deviations Theorems for Empirical
Measures, which includes Sanov's Theorem and a Conditional Limit Theorem, is
given next. Then a set-up for the study of conditioning by rare sources is
formulated, and Sanov's Theorem and the Conditional Limit Theorem for Sources are
stated under various assumptions. Next, the continuous case is treated, and the
results are applied to a criterion choice problem associated with the
$\beta$-problem. An Endnotes section points to relevant literature, mentions the
motivation for the present work and contains further discussion.

2. Terminology and notation

Let $\mathcal{P}(\mathcal{X})$ be the set of all probability mass functions on a finite
alphabet $\mathcal{X} = \{x_1, x_2, \dots, x_m\}$ of $m$ letters. The support of
$p \in \mathcal{P}(\mathcal{X})$ is the set $S(p) = \{x : p(x) > 0\}$.

A probability mass function (pmf) from $\mathcal{P}(\mathcal{X})$ is rational if it
belongs to the set $\mathcal{R} = \mathcal{P}(\mathcal{X}) \cap \mathbb{Q}^m$. A rational
pmf is $n$-rational if the denominators of all its $m$ elements are $n$. The set of
all $n$-rational pmf's will be denoted by $\mathcal{R}_n$.

Let $x_1, x_2, \dots, x_n$ be a sequence of $n$ letters drawn independently and
identically from a source $q \in \mathcal{P}(\mathcal{X})$. Type and $n$-type are other
names for the empirical measure induced by a sequence of length $n$. Formally, the
type is $\nu^n = [n_1, n_2, \dots, n_m]/n$, where $n_i$ is the number of occurrences of
the $i$-th letter of the alphabet in the sequence. Note that there are
$\Gamma(\nu^n) = n! \left(\prod_{i=1}^m n_i!\right)^{-1}$ different sequences of length
$n$ which induce the same type $\nu^n$. $\Gamma(\nu^n)$ is called the multiplicity of
the type. Finally, observe that $\nu^n$ is $n$-rational: $\nu^n \in \mathcal{R}_n$.
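The definitions of the type and its multiplicity can be illustrated by a short Python sketch. The sample sequence and the function names `n_type` and `multiplicity` are hypothetical and serve only as an illustration.

```python
from collections import Counter
from fractions import Fraction
from math import factorial


def n_type(sequence, alphabet):
    """Return the type (empirical pmf) of `sequence` as exact fractions."""
    counts = Counter(sequence)
    n = len(sequence)
    return [Fraction(counts[x], n) for x in alphabet]


def multiplicity(counts):
    """Number of length-n sequences that share the letter counts `counts`."""
    n = sum(counts)
    denominator = 1
    for c in counts:
        denominator *= factorial(c)
    return factorial(n) // denominator


sequence = "abca"                    # hypothetical sample, n = 4
nu = n_type(sequence, "abc")         # letter counts [2, 1, 1] divided by 4
gamma = multiplicity([2, 1, 1])      # 4! / (2! 1! 1!)
print(nu, gamma)
```

Exact fractions are used so that the $n$-rationality of the type is visible without rounding.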

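Let $\Pi, \mathcal{Q} \subseteq \mathcal{P}(\mathcal{X})$, and put
$\Pi_n = \Pi \cap \mathcal{R}_n$ and $\mathcal{Q}_n = \mathcal{Q} \cap \mathcal{R}_n$. The
former will be called the set of $n$-types $\nu^n$, the latter the set of $n$-sources $q^n$.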

The information divergence (± relative entropy, Kullback-Leibler distance etc.)
$I(p\|q)$ of $p$ with respect to $q$ (both from $\mathcal{P}(\mathcal{X})$) is
$I(p\|q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$, with the conventions that
$0 \log 0 = 0$ and $\log b/0 = +\infty$. The information projection (I-projection)
$\hat p$ of $q$ on $\Pi$ is $\hat p = \arg\inf_{p \in \Pi} I(p\|q)$. The value of the
I-divergence at an I-projection of $q$ on $\Pi$ is denoted by $I(\Pi\|q)$.

On $\mathcal{P}(\mathcal{X})$, the topology induced by the standard topology on
$\mathbb{R}^m$ is assumed.

The support $S(\mathcal{C})$ of a convex set $\mathcal{C} \subseteq \mathcal{P}(\mathcal{X})$
is the support of the member of $\mathcal{C}$ for which $S(\cdot)$ contains the
support of any other member of the set.

The following families of distributions will be needed: 

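1) Linear family $\mathcal{L}(u, a) = \{p : \sum_x p(x) u_j(x) = a_j,\ j = 1, 2, \dots, k\}$,
where $u_j$ is a real-valued function on $\mathcal{X}$ and $a_j \in \mathbb{R}$.

2) Exponential family $\mathcal{E}(p, u, \theta) = \{\tilde p : \tilde p(x) = z\, p(x) \exp(\sum_{j=1}^k \theta_j u_j(x)),\ x \in \mathcal{X}\}$,
where the normalizing factor satisfies $z^{-1} = \sum_x p(x) \exp(\sum_{j=1}^k \theta_j u_j(x))$,
$p$ belongs to $\mathcal{P}(\mathcal{X})$ and $\theta_j \in \mathbb{R}$.

3) $\Lambda$-family $\Lambda(p, u, \theta, a) = \{q : q(x) = p(x)[1 - \sum_{j=1}^k \theta_j (u_j(x) - a_j)]^{-1},\ x \in \mathcal{X}\}$.

The definitions of the families can be extended to a continuous $\mathcal{X}$ in a
straightforward way.

In what follows, $r \in \mathcal{P}(\mathcal{X})$ will be the 'true' source of sequences
and hence of types.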

3. Conditioning by rare types 

It is convenient to begin with a brief survey of the Large Deviations Theorems 
for Empirical Measures (Sanov's Theorem and a Conditional Limit Theorem). 

First, it is necessary to introduce the probability $\pi(\nu^n; r)$ that the source $r$
generates an $n$-type $\nu^n$. The probability that $r$ generates a sequence of $n$
letters $x_1, x_2, \dots, x_n$ which induces a type $\nu^n$ is
$\prod_{i=1}^m (r_i)^{n \nu_i^n}$. As already mentioned, there are $\Gamma(\nu^n)$
sequences of length $n$ which induce the same type $\nu^n$. The probability that $r$
generates the type $\nu^n$ is thus
$\pi(\nu^n; r) = \Gamma(\nu^n) \prod_{i=1}^m (r_i)^{n \nu_i^n}$. Consequently, for
$A \subseteq B \subseteq \mathcal{P}(\mathcal{X})$,
$\pi(\nu^n \in A \mid \nu^n \in B; r) = \frac{\pi(\nu^n \in A; r)}{\pi(\nu^n \in B; r)}$,
provided that $\pi(\nu^n \in B; r) \neq 0$.

$\Pi$ is rare if it does not contain $r$. Given that the source $r$ produced an $n$-type
from a rare $\Pi$, it is of interest to know how the conditional probability/measure
spreads among the rare $n$-types from $\Pi$, especially as $n$ grows beyond any limit.
For a rare set of a particular form, this issue is answered by the Conditional Limit
Theorem (CoLT), which is also known as the Conditional Weak Law of Large Numbers.

CoLT can be established by means of Sanov's Theorem (ST). 

ST. ([6] Thm 3) Let $\Pi$ be a set such that its closure is equal to the closure of its
interior. Let $r$ be such that $S(r) = \mathcal{X}$. Then,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(\nu^n \in \Pi; r) = -I(\Pi\|r).$$

Sanov's Theorem (ST) states that the probability $\pi(\nu^n \in \Pi; r)$ decays
exponentially fast, with the decay rate given by the value of the information
divergence at an I-projection of the source $r$ on $\Pi$.
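The decay rate in ST can be checked numerically on a small case. The following Python sketch is only an illustration under stated assumptions: a binary alphabet, a uniform source $r$, and the hypothetical rare set $\Pi = \{p : p(x_2) \ge 0.7\}$, whose I-projection of $r$ is $[0.3, 0.7]$. The function names are not from the paper.

```python
import math


def log_type_prob(counts, r):
    """log pi(nu^n; r) = log Gamma(nu^n) + sum_i n_i log r_i (r strictly positive)."""
    n = sum(counts)
    log_multiplicity = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_multiplicity + sum(c * math.log(ri) for c, ri in zip(counts, r))


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def i_div(p, q):
    """Information divergence I(p||q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sanov_rate(n, r, num=7, den=10):
    """(1/n) log pi(nu^n in Pi; r) for Pi = {p : p(second letter) >= num/den}."""
    logs = [log_type_prob([n - k, k], r) for k in range(n + 1) if den * k >= num * n]
    return log_sum_exp(logs) / n


r = [0.5, 0.5]                      # assumed 'true' source
limit = -i_div([0.3, 0.7], r)       # -I(Pi||r), attained on the boundary of Pi
for n in (10, 100, 1000):
    print(n, sanov_rate(n, r), limit)
```

The printed rates approach the limit slowly, since the polynomial factors in the type probabilities vanish only at the rate $(\log n)/n$.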

CoLT. ([8] Thm 4.1, [2] Thm 12.6.2) Let $\Pi$ be a convex, closed rare set. Let
$B(\hat p, \epsilon)$ be a closed $\epsilon$-ball defined by the total variation metric,
centered at the I-projection $\hat p$ of $r$ on $\Pi$. Then for any $\epsilon > 0$,

$$\lim_{n \to \infty} \pi(\nu^n \in B(\hat p, \epsilon) \mid \nu^n \in \Pi; r) = 1.$$

Informally, CoLT states that if a dense rare set admits a unique I-projection,
then asymptotically types conditionally concentrate just on it. Thus, provided that
for sufficiently large $n$ a type from the rare $\Pi$ occurred, with probability close
to 1 it is a type close to $\hat p$. Numeric examples of ST and CoLT can be found in [2].

This suggests that, conditionally upon the rare $\Pi$, it is the I-projection $\hat p$
rather than $r$ which should be considered as the true iid source of data. Gibbs'
Conditioning Principle (GCP), an important strengthening of CoLT, captures this
'intuition'; cf. [3], [7].

If $S(\mathcal{L}) = \mathcal{X}$, then the I-projection $\hat p$ of $r$ on
$\Pi = \mathcal{L}$ is unique and belongs to the exponential family of distributions
$\mathcal{E}(r, u, \theta)$; i.e., $\mathcal{L}(u, a) \cap \mathcal{E}(r, u, \theta) = \{\hat p\}$.
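For a single linear constraint, the I-projection on a linear family can be computed by tilting $r$ exponentially and solving for $\theta$. The Python sketch below assumes one constraint function $u$, a value $a$ strictly between $\min u$ and $\max u$, and a bracketing interval for $\theta$ chosen for illustration; the uniform source is a hypothetical choice.

```python
import math


def exponential_tilt(r, u, theta):
    """Member of the exponential family E(r, u, theta) for a single u."""
    weights = [ri * math.exp(theta * ui) for ri, ui in zip(r, u)]
    total = sum(weights)
    return [w / total for w in weights]


def i_projection_linear(r, u, a, lo=-50.0, hi=50.0, iters=200):
    """I-projection of r on {p : sum_x p(x) u(x) = a}, found by bisection on theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        p = exponential_tilt(r, u, mid)
        # The mean of u under the tilted pmf is increasing in theta.
        if sum(pi * ui for pi, ui in zip(p, u)) > a:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    return theta, exponential_tilt(r, u, theta)


r = [0.25, 0.25, 0.25, 0.25]          # assumed source on X = {1, 2, 3, 4}
u = [1, 2, 3, 4]
theta, p_hat = i_projection_linear(r, u, 1.7)
print(theta, p_hat)
```

Bisection is used instead of Newton's method only to keep the sketch dependency-free.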

4. Conditioning by rare sources 

In the above setting there is a fixed source $r$ and a rare set $\Pi_n$ of $n$-types.
We now consider a setting where the $n$-type is unique, and there is a set
$\mathcal{Q}_n$ of rare $n$-sources of the type.

Furthermore, $n$-sources $q^n$ are assumed to have a prior distribution $\pi(q^n)$.
If an $n$-source $q^n$ from $\mathcal{R}_n$ occurs, then the source generates the
$n$-type $\nu^n$ with the probability
$\pi(\nu^n \mid q^n) = \pi(\nu^n; q^n) = \Gamma(\nu^n) \prod_{i=1}^m (q_i^n)^{n \nu_i^n}$.

We are interested in the asymptotic behavior of the probability
$\pi(q^n \in B \mid (q^n \in \mathcal{Q}) \wedge \nu^n)$ that, if the $n$-type $\nu^n$ and
an $n$-source $q^n$ from a rare set $\mathcal{Q}$ occurred, then the $n$-source belongs
to a subset $B$ of $\mathcal{Q}$. Note that
$\pi(q^n \in B \mid (q^n \in \mathcal{Q}) \wedge \nu^n) = \frac{\pi(q^n \in B \mid \nu^n)}{\pi(q^n \in \mathcal{Q} \mid \nu^n)}$,
provided that $\pi(q^n \in \mathcal{Q} \mid \nu^n) > 0$. The posterior probability
$\pi(q^n \mid \nu^n)$ is related to the probabilities $\pi(\nu^n \mid q^n)$ and
$\pi(q^n)$ defined above via Bayes's Theorem.

Asymptotic investigations will first be carried out under the assumption of a
uniform prior distribution of $n$-sources (Sect. 4.1). The assumption will be relaxed
in Section 4.2. Within each of the sections, two cases of convergence will be
considered: a static and a dynamic case. For the static case, asymptotic
investigations are carried out over a subsequence of types which are $k$-equivalent
to $\nu^{n_0}$. A type $\nu^{k n_0} = [k n_1, \dots, k n_m]/(k n_0)$, $k \in \mathbb{N}$, is
called $k$-equivalent to $\nu^{n_0}$. The dynamic case assumes that there is a sequence
of $n$-types which converges in total variation to some
$p \in \mathcal{P}(\mathcal{X})$. For each case, what is meant by a rare source will be
defined separately.

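For $p, q \in \mathcal{P}(\mathcal{X})$, the L-divergence $L(q\|p)$ of $q$ with respect
to $p$ is the map
$L : \mathcal{P}(\mathcal{X}) \times \mathcal{P}(\mathcal{X}) \to \mathbb{R} \cup \{-\infty\}$,
$L(q\|p) = \sum_x p(x) \log q(x)$. The L-projection $\hat q$ of $p$ on a set of sources
$\mathcal{Q}$ is $\hat q = \arg\sup_{q \in \mathcal{Q}} L(q\|p)$. The value of the
L-divergence at an L-projection (i.e., $\sup_{q \in \mathcal{Q}} L(q\|p)$) is denoted
by $L(\mathcal{Q}\|p)$.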
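The L-divergence and its relation to the I-divergence can be illustrated by a minimal Python sketch. The pmf's below are hypothetical, and the identity $L(q\|p) = -H(p) - I(p\|q)$, with $H(p)$ the Shannon entropy, is standard algebra rather than a statement taken from the text above.

```python
import math


def l_div(q, p):
    """L(q||p) = sum_x p(x) log q(x), with 0 log 0 = 0 and p log 0 = -inf."""
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0:
            continue
        if qi == 0:
            return -math.inf
        total += pi * math.log(qi)
    return total


def i_div(p, q):
    """I(p||q) = sum_x p(x) log(p(x)/q(x)), with the same conventions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total


p = [0.1, 0.1, 0.1, 0.7]
q = [0.4, 0.3, 0.2, 0.1]
entropy = -l_div(p, p)                        # Shannon entropy H(p)
print(l_div(q, p), -entropy - i_div(p, q))    # the two numbers agree
print(l_div(p, p) >= l_div(q, p))             # p maximizes L(.||p) over P(X)
```

In particular, maximizing $L(\cdot\|p)$ over a set is the same as minimizing $I(p\|\cdot)$ over it.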

4.1 Uniform prior.

Within this section it is assumed that $n$-sources have a uniform prior distribution.
Since there are in total $N = \binom{n+m-1}{m-1}$ $n$-sources (cf. [4]), the uniform
prior probability is

$$\pi(q^n) = 1/N, \quad \text{for all } q^n \in \mathcal{R}_n.$$
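The count $N$ of $n$-sources can be confirmed by brute-force enumeration. This Python sketch assumes a small hypothetical case ($n = 10$, $m = 4$) and also compares $N$ with the polynomial bound $(n+1)^m$ commonly used in the Method of Types.

```python
from math import comb


def count_vectors(n, m):
    """Yield every vector of m non-negative integers that sums to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, m - 1):
            yield (first,) + rest


n, m = 10, 4
enumerated = sum(1 for _ in count_vectors(n, m))   # each vector / n is an n-source
closed_form = comb(n + m - 1, m - 1)
print(enumerated, closed_form, (n + 1) ** m)
```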

4.1.1 Static case.

Let there be an $n_0$-type $\nu^{n_0}$. A set $\mathcal{Q}$ of sources is rare if it does
not contain $\nu^{n_0}$.

Sanov's Theorem for Sources (abbreviated LST) is a counterpart of Sanov's
Theorem for Types.

Static LST. Let $\nu^{n_0}$ be a type. Let $\mathcal{Q}$ be an open set of sources. Then,
for $n \to \infty$ over a subsequence $n = k n_0$, $k \in \mathbb{N}$,

$$\frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n) = L(\mathcal{Q}\|\nu^{n_0}) - L(\mathcal{P}\|\nu^{n_0}).$$

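Proof. Under the assumption of a uniform prior distribution of $n$-sources,

$$\log \pi(q^n \in \mathcal{Q} \mid \nu^n) = \log \sum_{q^n \in \mathcal{Q}_n} \prod_x (q_x^n)^{n \nu_x^n} - \log \sum_{q^n \in \mathcal{P}_n} \prod_x (q_x^n)^{n \nu_x^n},$$

where $\mathcal{P}_n = \mathcal{P}(\mathcal{X}) \cap \mathcal{R}_n$. Since $N < (n+1)^m$
(cf. Lemma 2.1.2 of [7]), $\frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n)$ can be
bounded from above and below as:

$$L(\mathcal{Q}_n\|\nu^{n_0}) - L(\mathcal{P}_n\|\nu^{n_0}) - \frac{m}{n} \log(n+1) < \frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n) < L(\mathcal{Q}_n\|\nu^{n_0}) - L(\mathcal{P}_n\|\nu^{n_0}) + \frac{m}{n} \log(n+1).$$

Fix $p \in \mathcal{P}(\mathcal{X})$. Equip $\mathbb{R} \cup \{-\infty\}$ with the standard
topology (i.e., the topology induced by the total order). As for each open subset $A$
of $\mathbb{R} \cup \{-\infty\}$, $L^{-1}(A)$ is an open subset of
$\mathcal{P}(\mathcal{X})$, the L-divergence is continuous in $q$.

$\mathcal{Q}$ is open by the assumption.

Thus, $L(\mathcal{Q}_n\|\nu^{n_0})$ converges to $L(\mathcal{Q}\|\nu^{n_0})$ as
$n \to \infty$, $n = k n_0$, $k \in \mathbb{N}$. Also, $L(\mathcal{P}_n\|\nu^{n_0})$
converges to $L(\mathcal{P}\|\nu^{n_0})$ for $n \to \infty$, $n = k n_0$,
$k \in \mathbb{N}$. □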
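The two-sided bound used in the proof can be checked by direct enumeration on a small alphabet. The Python sketch below is an illustration under assumptions: a hypothetical type $\nu^{n_0} = [2, 3, 5]/10$, the open set $\mathcal{Q} = \{q : q(x_1) > 0.5\}$, and a uniform prior. The helper names are not from the paper.

```python
import math


def count_vectors(n, m):
    """Yield every vector of m non-negative integers that sums to n."""
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in count_vectors(n - first, m - 1):
            yield (first,) + rest


def l_div(q, p):
    """L(q||p) = sum_x p(x) log q(x); -inf if q misses a letter of S(p)."""
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0:
            continue
        if qi == 0:
            return -math.inf
        total += pi * math.log(qi)
    return total


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def lst_sandwich(counts, k, in_q):
    """Return (lower bound, (1/n) log posterior of Q, upper bound) for n = k * n0."""
    n0, m = sum(counts), len(counts)
    n = k * n0
    nu = [c / n0 for c in counts]
    all_l, rare_l = [], []
    for vec in count_vectors(n, m):
        q = [c / n for c in vec]
        value = l_div(q, nu)            # posterior weight of q is exp(n * value)
        all_l.append(value)
        if in_q(q):
            rare_l.append(value)
    rate = (log_sum_exp([n * v for v in rare_l]) - log_sum_exp([n * v for v in all_l])) / n
    centre = max(rare_l) - max(all_l)   # L(Q_n||nu) - L(P_n||nu)
    slack = m * math.log(n + 1) / n
    return centre - slack, rate, centre + slack


lower, rate, upper = lst_sandwich([2, 3, 5], 5, lambda q: q[0] > 0.5)
print(lower, rate, upper)
```

Increasing `k` shrinks the slack term $\frac{m}{n}\log(n+1)$, which is the mechanism behind the limit in the Static LST.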

The Law of Large Numbers for Sources (LLLN) is a direct consequence of LST.

Static LLLN. Let $\nu^{n_0}$ be a type. Let $\hat q$ be the L-projection of $\nu^{n_0}$ on
$\mathcal{P}(\mathcal{X})$. And let $B(\hat q, \epsilon)$ be a closed $\epsilon$-ball
defined by the total variation metric, centered at $\hat q$. Then, for $\epsilon > 0$
and $n \to \infty$ over the types which are $k$-equivalent to $\nu^{n_0}$,

$$\pi(q^n \in B(\hat q, \epsilon) \mid \nu^n) = 1.$$

Proof. Let $B^c(\hat q, \epsilon) = \mathcal{P}(\mathcal{X}) \setminus B(\hat q, \epsilon)$.
Since $B^c(\hat q, \epsilon)$ is open, being the complement of a closed ball, LST can
be applied to it. Since $B^c \subset \mathcal{P}$,
$L(B^c\|\nu^{n_0}) - L(\mathcal{P}\|\nu^{n_0}) < 0$. Thus,
$\pi(q^n \in B^c(\hat q, \epsilon) \mid \nu^n)$ converges to 0 as $n \to \infty$ over a
subsequence $n = k n_0$, $k \in \mathbb{N}$. □

Obviously, the L-projection $\hat q$ of $\nu^{n_0}$ on $\mathcal{P}(\mathcal{X})$ is
$\hat q = \nu^{n_0}$.

LLLN is a special, unconditional case of the Conditional Limit Theorem for
Sources (LCoLT), which is a consequence of LST as well.

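Static LCoLT. Let $\nu^{n_0}$ be a type. Let $\mathcal{Q}$ be a convex, closed rare set
of sources. Let $\hat q$ be the L-projection of $\nu^{n_0}$ on $\mathcal{Q}$ and let
$B(\hat q, \epsilon)$ be a closed $\epsilon$-ball defined by the total variation
metric, centered at $\hat q$. Then, for $\epsilon > 0$ and $n \to \infty$ over a
subsequence $n = k n_0$, $k \in \mathbb{N}$,

$$\pi(q^n \in B(\hat q, \epsilon) \mid (q^n \in \mathcal{Q}) \wedge \nu^n) = 1.$$

Proof. Let $B^c(\hat q, \epsilon) = \mathcal{P}(\mathcal{X}) \setminus B(\hat q, \epsilon)$.
Clearly,

$$\log \pi(q^n \in B^c(\hat q, \epsilon) \mid (q^n \in \mathcal{Q}) \wedge \nu^n) = \log \pi(q^n \in B^c \mid \nu^n) - \log \pi(q^n \in \mathcal{Q} \mid \nu^n).$$

Since both $B^c(\hat q, \epsilon)$ and $\mathcal{Q}$ are open, LST can be applied. As
$B^c(\hat q, \epsilon) \subset \mathcal{Q}$,
$L(B^c\|\nu^{n_0}) - L(\mathcal{Q}\|\nu^{n_0}) < 0$. Hence
$\pi(q^n \in B^c \mid (q^n \in \mathcal{Q}) \wedge \nu^n)$ converges to 0 as
$n \to \infty$ over a subsequence $n = k n_0$, $k \in \mathbb{N}$. Since under the
assumptions on $\mathcal{Q}$ the L-projection of $\nu^{n_0}$ on $\mathcal{Q}$ is unique,
the claim of the Theorem follows. □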

Example. Let $\mathcal{X} = \{1, 2, 3, 4\}$. Let
$\mathcal{Q} = \{q : \sum_{x \in \mathcal{X}} q(x)\, x = 1.7\}$. Let $n_0 = 10$ and
$\nu^{n_0} = [1, 1, 1, 7]/10$. The L-projection of $\nu^{n_0}$ on $\mathcal{Q}$ is
$\hat q = [0.705, 0.073, 0.039, 0.183]$. Let $\epsilon = 0.1$. The concentration of
$n$-sources on the L-projection, which is captured by the Static LCoLT, is
illustrated in Table 1 for types $k$-equivalent to $\nu^{n_0}$ ($k = 5, 10, 20, 30$).

Table 1. Values of $\pi(q^n \in B(\hat q, \epsilon) \mid (q^n \in \mathcal{Q}) \wedge \nu^n)$
for $n = k n_0$, $k = 5, 10, 20, 30$.

    n      π(·|·)
    50     0.868
    100    0.948
    200    0.994
    300    0.999
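The kind of computation behind Table 1 can be sketched in Python by enumerating the $n$-sources in $\mathcal{Q}$ and summing their posterior weights. The sketch assumes a uniform prior, $n$ a multiple of 10, and takes the ball with respect to the distance $\sum_x |q(x) - \hat q(x)|$; the text does not state which normalization of the total variation metric was used, so the printed values need not coincide with Table 1.

```python
import math


def posterior_mass_in_ball(counts, k, centre, eps):
    """pi(q^n in B(centre, eps) | (q^n in Q) and nu^n) for the Example's Q.

    Q = {q : sum_x x q(x) = 1.7} on X = {1, 2, 3, 4}; uniform prior on n-sources.
    Assumption: the ball uses the distance sum_x |q(x) - centre(x)|.
    """
    n0 = sum(counts)
    n = k * n0
    target = 17 * n // 10                       # n times the prescribed mean 1.7
    log_weights, inside = [], []
    for a in range(1, n + 1):
        for b in range(1, n + 1 - a):
            rest = n - a - b                    # c + d
            d = target - a - 2 * b - 3 * rest   # from a + 2b + 3c + 4d = target
            c = rest - d
            if c < 1 or d < 1:                  # a zero entry has zero posterior mass here
                continue
            q = (a / n, b / n, c / n, d / n)
            log_weights.append(sum(k * ni * math.log(qi) for ni, qi in zip(counts, q)))
            inside.append(sum(abs(qi - ci) for qi, ci in zip(q, centre)) <= eps)
    top = max(log_weights)
    weights = [math.exp(w - top) for w in log_weights]
    return sum(w for w, flag in zip(weights, inside) if flag) / sum(weights)


nu_counts = [1, 1, 1, 7]
q_hat = [0.705, 0.073, 0.039, 0.183]            # L-projection quoted in the Example
for k in (5, 10, 20, 30):
    print(10 * k, posterior_mass_in_ball(nu_counts, k, q_hat, 0.1))
```

Working with log-weights and subtracting the largest one avoids underflow for the larger values of $n$.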
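
The L-projection in the above Example can be found by means of the following
Proposition.

Proposition. Let $\mathcal{Q} = \mathcal{L}(u, a)$. Let $p \in \mathcal{P}(\mathcal{X})$
be such that $S(p) = S(\mathcal{L})$. Then the L-projection $\hat q$ of $p$ on
$\mathcal{Q}$ is unique and belongs to the $\Lambda(p, u, \theta, a)$ family; i.e.,
$\mathcal{L}(u, a) \cap \Lambda(p, u, \theta, a) = \{\hat q\}$.

Proof. In light of Theorem 9 of [6] it suffices to check that
$\hat q = p[1 - \sum_{j=1}^k \theta_j (u_j(x) - a_j)]^{-1}$, with $\theta$ such that
$\hat q \in \mathcal{L}(u, a)$, satisfies:

$$\sum_{x \in S(p)} p(x) \frac{q'(x)}{\hat q(x)} \le 1$$

for all $q' \in \mathcal{Q}$, which is indeed the case: substituting the form of
$\hat q$ turns the left-hand side into
$\sum_x q'(x)[1 - \sum_{j=1}^k \theta_j (u_j(x) - a_j)] = 1$, since
$q' \in \mathcal{L}(u, a)$. □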
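The Proposition suggests a simple numerical recipe for a single constraint: search for the $\theta$ at which the $\Lambda$-family member satisfies the constraint. The Python sketch below assumes a strictly positive $p$, one constraint function $u$, and $\min u < a < \max u$; the function name is hypothetical.

```python
def l_projection_linear(p, u, a, iters=200):
    """L-projection of a strictly positive pmf p on {q : sum_x q(x) u(x) = a}.

    Uses the Lambda-family form q(x) = p(x) / (1 - theta (u(x) - a)) and finds
    theta by bisection (single constraint, min(u) < a < max(u) assumed).
    """
    d = [ui - a for ui in u]
    lo, hi = 1.0 / min(d), 1.0 / max(d)     # open interval where all denominators are positive

    def moment(theta):                       # sum_x q(x) (u(x) - a), increasing in theta
        return sum(pi * di / (1.0 - theta * di) for pi, di in zip(p, d))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if moment(mid) > 0:
            hi = mid
        else:
            lo = mid
    theta = 0.5 * (lo + hi)
    return theta, [pi / (1.0 - theta * di) for pi, di in zip(p, d)]


theta, q_hat = l_projection_linear([0.1, 0.1, 0.1, 0.7], [1, 2, 3, 4], 1.7)
print(theta, [round(qi, 3) for qi in q_hat])
```

With the data of the Example, the result agrees with the quoted L-projection to three decimals. Normalization comes for free: since $\sum_x q(x)[1 - \theta(u(x) - a)] = \sum_x p(x) = 1$, the constraint $\sum_x q(x)(u(x) - a) = 0$ already forces $\sum_x q(x) = 1$.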

4.1.2 Dynamic case.

Let there be a sequence of $n$-types which converges in total variation to a
pmf $p \in \mathcal{P}(\mathcal{X})$, denoted as $\nu^n \to p$. In this case, a set
$\mathcal{Q}$ of sources is rare if it does not contain $p$.

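Dynamic LST. Let $\nu^n \to p$. Let $\mathcal{Q}$ be an open set of sources. Then,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n) = L(\mathcal{Q}\|p) - L(\mathcal{P}\|p).$$

Dynamic LLLN. Let $\nu^n \to p$. Let $\hat q$ be the L-projection of $p$ on
$\mathcal{P}(\mathcal{X})$. And let $B(\hat q, \epsilon)$ be a closed $\epsilon$-ball
defined by the total variation metric, centered at $\hat q$. Then, for $\epsilon > 0$,

$$\lim_{n \to \infty} \pi(q^n \in B(\hat q, \epsilon) \mid \nu^n) = 1.$$

Dynamic LCoLT. Let $\nu^n \to p$. Let $\mathcal{Q}$ be a convex, closed rare set of
sources. Let $\hat q$ be the L-projection of $p$ on $\mathcal{Q}$ and let
$B(\hat q, \epsilon)$ be a closed $\epsilon$-ball defined by the total variation
metric, centered at $\hat q$. Then, for $\epsilon > 0$,

$$\lim_{n \to \infty} \pi(q^n \in B(\hat q, \epsilon) \mid (q^n \in \mathcal{Q}) \wedge \nu^n) = 1.$$

Proofs can be constructed along the lines of the static case.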

4.2 General prior. 

Let $\pi(q)$ be a prior pmf on $\mathcal{R}$. From this pmf, a prior distribution
$\pi_A(q^n)$ on $\mathcal{R}_n$ is constructed by a quantization
$A = \{A_1, A_2, \dots, A_N\}$ of $\mathcal{R}$ into disjoint sets, such that each
$A_j \in A$ contains just one $q^n$ from $\mathcal{R}_n$. Then
$\pi_A(q^n) = \pi(\{A_j : q^n \in A_j,\ j = 1, 2, \dots, N\})$.

Let $S = S(\pi(\cdot))$. Let $\mathcal{Q}^\pi = \mathcal{Q} \cap S$ and
$\mathcal{P}^\pi = \mathcal{P} \cap S$.

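As the static case is subsumed under the dynamic one, only the latter limit
theorems will be presented.

General prior LST. Let $\nu^n \to p$. Let $\mathcal{Q}$ be an open set of sources. Let
$\mathcal{Q}^\pi \neq \emptyset$. Then,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi_A(q^n \in \mathcal{Q} \mid \nu^n) = L(\mathcal{Q}^\pi\|p) - L(\mathcal{P}^\pi\|p).$$

Proof. For a zero-prior-probability $n$-source, the posterior probability is zero as
well, so such sources can be excluded from consideration. Let
$S_n = S(\pi_A(q^n))$, $\mathcal{Q}_n^\pi = \mathcal{Q} \cap S_n$ and
$\mathcal{P}_n^\pi = \mathcal{P} \cap S_n$. Then

$$\log \pi_A(q^n \in \mathcal{Q} \mid \nu^n) = \log \sum_{q^n \in \mathcal{Q}_n^\pi} \pi_A(q^n) \prod_x (q_x^n)^{n \nu_x^n} - \log \sum_{q^n \in \mathcal{P}_n^\pi} \pi_A(q^n) \prod_x (q_x^n)^{n \nu_x^n}.$$

Denote $\lambda(\mathcal{Q}_n^\pi\|\nu^n) = \sup_{q^n \in \mathcal{Q}_n^\pi} \lambda(q^n\|\nu^n)$,
where $\lambda(q^n\|\nu^n) = L(q^n\|\nu^n) + \frac{1}{n} \log \pi_A(q^n)$. Using this
notation and invoking the same argument as in the proof of LST for the uniform
prior, $\frac{1}{n} \log \pi_A(q^n \in \mathcal{Q} \mid \nu^n)$ can be bounded from above
and below as:

$$\lambda(\mathcal{Q}_n^\pi\|\nu^n) - \lambda(\mathcal{P}_n^\pi\|\nu^n) - \frac{m}{n} \log(n+1) < \frac{1}{n} \log \pi_A(q^n \in \mathcal{Q} \mid \nu^n) < \lambda(\mathcal{Q}_n^\pi\|\nu^n) - \lambda(\mathcal{P}_n^\pi\|\nu^n) + \frac{m}{n} \log(n+1).$$

Since, for $n \to \infty$, $S_n = S$ and $\nu^n \to p$, and since $\mathcal{Q}$ is open,
it follows that $\lambda(\mathcal{Q}_n^\pi\|\nu^n)$ converges to
$L(\mathcal{Q}^\pi\|p)$. Similarly, $\lambda(\mathcal{P}_n^\pi\|\nu^n)$ converges to
$L(\mathcal{P}^\pi\|p)$. □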

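From the General prior LST follows

General prior LCoLT. Let $\nu^n \to p$. Let $\mathcal{Q}$ be a convex, closed rare
(i.e., $p \notin \mathcal{Q}$) set of sources. Let $\mathcal{Q}^\pi \neq \emptyset$ and
let $\hat q^\pi$ be the L-projection of $p$ on $\mathcal{Q}^\pi$. Let
$B(\hat q^\pi, \epsilon)$ be a closed $\epsilon$-ball defined by the total variation
metric, centered at $\hat q^\pi$. Then, for $\epsilon > 0$,

$$\lim_{n \to \infty} \pi_A(q^n \in B(\hat q^\pi, \epsilon) \mid (q^n \in \mathcal{Q}) \wedge \nu^n) = 1.$$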

4.3 Conditioning by rare sources: continuous alphabet 

Sanov's Theorem for a continuous alphabet can be established either via 'the
method of types + discrete approximation' approach (cf. [4]) or by means of
large deviations theory (cf. [7]). The former approach will be used here to
formulate a continuous alphabet version of LST.

Let $(\mathcal{Y}, \mathcal{F})$ be a measurable space. Let $\mathcal{T}^m$ be a partition
of the alphabet $\mathcal{Y}$ into a finite number $m$ of sets,
$\mathcal{T}^m = (T_1, T_2, \dots, T_m)$, $T_j \in \mathcal{F}$. The
$\mathcal{T}^m$-quantized $P$, denoted by $P^{\mathcal{T}}$, is defined as the
distribution $P(T_1), P(T_2), \dots, P(T_m)$ on the finite set
$\mathcal{X} = \{1, 2, \dots, m\}$.

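Let $\mathcal{P}(\mathcal{Y})$ be the set of all probability measures on
$(\mathcal{Y}, \mathcal{F})$. Let $\mathcal{Q} \subseteq \mathcal{P}(\mathcal{Y})$. For
probability measures (pm's) $P, Q \in \mathcal{P}(\mathcal{Y})$, the $L^m$-divergence
$L^m(Q\|P)$ of $Q$ with respect to $P$ is defined as

$$L^m(Q\|P) = \sup_{\mathcal{T}^m} L(Q^{\mathcal{T}}\|P^{\mathcal{T}}),$$

where the supremum is taken over all $m$-element partitions. $L^m(\mathcal{Q}\|P)$
denotes $\sup_{Q \in \mathcal{Q}} L^m(Q\|P)$. Let
$\mathcal{Q}^{\mathcal{T}} = \{Q^{\mathcal{T}} : Q \in \mathcal{Q}\}$ and
$L^m(\mathcal{Q}^{\mathcal{T}}\|P^{\mathcal{T}}) = \sup_{Q \in \mathcal{Q}} L(Q^{\mathcal{T}}\|P^{\mathcal{T}})$.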

The empirical distribution $\nu^{n,m}$ of an $n$-sequence $Y_1, \dots, Y_n$ of
$\mathcal{Y}$-valued random variables with respect to a partition $\mathcal{T}^m$ is
defined as

$$\nu_j^{n,m} = \frac{1}{n}\, \mathrm{card}\{i : Y_i \in T_j;\ 1 \le i \le n\}, \quad 1 \le j \le m.$$
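The empirical distribution with respect to a partition amounts to counting how many observations fall into each cell. The Python sketch below assumes, purely for illustration, that $\mathcal{Y}$ is the real line and that the partition consists of intervals given by sorted cut points; the samples and the function name are hypothetical.

```python
from bisect import bisect_right


def quantized_type(samples, cut_points):
    """Empirical distribution of real samples w.r.t. an interval partition.

    Sorted cut points c_1 < ... < c_{m-1} define the m cells
    (-inf, c_1), [c_1, c_2), ..., [c_{m-1}, +inf).
    """
    counts = [0] * (len(cut_points) + 1)
    for y in samples:
        counts[bisect_right(cut_points, y)] += 1   # index of the cell containing y
    n = len(samples)
    return [c / n for c in counts]


samples = [0.1, 0.4, 0.5, 0.9]            # hypothetical observations
print(quantized_type(samples, [0.5]))     # two cells: below 0.5, and 0.5 or above
```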

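The $\mathcal{T}^m$-topology of pm's on $(\mathcal{Y}, \mathcal{F})$ is the topology in
which a pm $Q$ belongs to the interior of a set $\mathcal{Q}$ of pm's iff for some
partition $\mathcal{T}^m$ and $\epsilon > 0$,

$$\{Q' : |Q'(T_j) - Q(T_j)| < \epsilon,\ j = 1, 2, \dots, m\} \subseteq \mathcal{Q}.$$

Thus, an $n$-source $q^n \in \mathcal{R}_n(\mathcal{X})$ belongs to the interior of
$\mathcal{Q}$ if there exist a partition $\mathcal{T}^m$ of $\mathcal{Y}$ and
$\epsilon > 0$ such that the set
$\{Q' : |Q'(T_j) - q_j^n| < \epsilon,\ j = 1, 2, \dots, m\}$ is a subset of
$\mathcal{Q}$.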

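Under the assumption of a uniform prior distribution of $n$-sources, a continuous
analogue of the Dynamic LST is:

Continuous LST. Let, as $n \to \infty$, $\nu^{n,m} \to R$, $R \in \mathcal{R}(\mathcal{X})$.
Let $\mathcal{Q}$ be a rare (i.e., $R \notin \mathcal{Q}$) open subset of
$\mathcal{P}(\mathcal{Y})$. Then

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^{n,m}) = L^m(\mathcal{Q}\|R) - L^m(\mathcal{P}\|R).$$

Proof. First, an asymptotic lower bound on
$\frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n)$ will be established. Pick a $Q$
such that for some $\mathcal{T}^m$ and some $\epsilon > 0$, $Q \in \mathcal{Q}$. Let
$M^{\mathcal{T}}(Q) = \{q^n : |q_j^n - Q(T_j)| < \epsilon,\ j = 1, 2, \dots, m\}$. By the
Dynamic LST for the uniform prior,
$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in M^{\mathcal{T}}(Q) \mid \nu^n) = L(M^{\mathcal{T}}(Q)\|R^{\mathcal{T}}) - L(R^{\mathcal{T}}\|R^{\mathcal{T}})$,
which is greater than or equal to
$L(Q^{\mathcal{T}}\|R^{\mathcal{T}}) - L(R^{\mathcal{T}}\|R^{\mathcal{T}})$, since
$Q^{\mathcal{T}} \in M^{\mathcal{T}}(Q)$. Let
$M(Q) = \bigcup_{\mathcal{T}^m} M^{\mathcal{T}}(Q)$.

Then, for $n \to \infty$,
$\frac{1}{n} \log \pi(q^n \in M(Q) \mid \nu^n) \ge \sup_{\mathcal{T}^m} L(Q^{\mathcal{T}}\|R^{\mathcal{T}}) - L(R^{\mathcal{T}}\|R^{\mathcal{T}}) = L^m(Q\|R) - L^m(R\|R)$.
Since $\pi(q^n \in \mathcal{Q} \mid \nu^n) \ge \sup_{Q \in \mathcal{Q}} \pi(q^n \in M(Q) \mid \nu^n)$,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n) \ge \sup_{Q \in \mathcal{Q}} L^m(Q\|R) - L^m(R\|R) = L^m(\mathcal{Q}\|R) - L^m(\mathcal{P}\|R).$$

Asymptotic upper bound: for $\mathcal{T}^m$ as above, by the Dynamic LST with a uniform
prior,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in \mathcal{Q}^{\mathcal{T}} \mid \nu^n) = L^m(\mathcal{Q}^{\mathcal{T}}\|R^{\mathcal{T}}) - L^m(\mathcal{P}^{\mathcal{T}}\|R^{\mathcal{T}}) = \sup_{Q \in \mathcal{Q}} L(Q^{\mathcal{T}}\|R^{\mathcal{T}}) - L(R^{\mathcal{T}}\|R^{\mathcal{T}}).$$

Since $\pi(q^n \in \mathcal{Q} \mid \nu^n) \le \sup_{\mathcal{T}^m} \pi(q^n \in \mathcal{Q}^{\mathcal{T}} \mid \nu^n)$,

$$\lim_{n \to \infty} \frac{1}{n} \log \pi(q^n \in \mathcal{Q} \mid \nu^n) \le L^m(\mathcal{Q}\|R) - L^m(\mathcal{P}\|R).$$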

As the asymptotic lower and upper bounds coincide, the claim follows. □

5. Application to Criterion Choice Problem

1. Let there be an alphabet $\mathcal{X}$ (finite, for simplicity) and a prior
distribution $\pi(q^n)$ of $n$-rational sources. From the prior $\pi(q^n)$ an
$n$-source is drawn, and the source then generates an $n$-type $\nu^n$. We are not
given the actual $n$-source, but rather a set $\mathcal{Q}$ to which the $n$-source
belongs. Given the alphabet $\mathcal{X}$, the $n$-type $\nu^n$, the prior distribution
of sources $\pi(\cdot)$ and the set $\mathcal{Q} \subseteq \mathcal{P}(\mathcal{X})$, the
objective is to select an $n$-source $q^n \in \mathcal{Q}$. This constitutes the
$\beta$-problem. Since $\mathcal{Q}$ in general contains more than one $n$-source, the
problem is under-determined and in this sense ill-posed.

If $\mathcal{Q} = \mathcal{P}(\mathcal{X})$, then under the assumption of a uniform
prior distribution of $n$-sources, Static LLLN shows that asymptotically (along the
types $k$-equivalent to $\nu^n$) it is just $\hat q = \nu^n$ which is the 'only
possible' source of $\nu^n$ (i.e., of itself) (footnote 1). Dynamic LLLN, assuming
that $\nu^n \to r$, implies that the $n$-sources concentrate on the true source.
However, they do not if a general prior is assumed which puts zero probability on
the true source. In the dynamic case ($\nu^n \to r$) with a general prior, $n$-sources
concentrate on the L-projection of $r$ on $\mathcal{P}^\pi$.

What if $\mathcal{Q}$ does not contain $\nu^n$? How should an $n$-source be selected in
this case? One possibility is to select $q^n$ from $\mathcal{Q}$ by minimization of a
distance or a convex statistical distance measure [16] between $\nu^n$ and
$\mathcal{Q}_n$. In this way, the original $\beta$-problem of selecting
$q^n \in \mathcal{Q}$ is transformed into an associated Criterion Choice Problem (CCP).

If the rare $\mathcal{Q}$ is convex and closed, Static LCoLT shows that, at least for
$n$ sufficiently large, the CCP associated with this instance of the
$\beta$-problem should be solved by maximization of the L-divergence over
$\mathcal{Q}$. A major qualifier has to be added to this statement: it holds provided
that a uniform prior distribution of $n$-sources is assumed. If a general prior
that is strictly positive on the entire set of rational sources is assumed, then
the statement still holds. The prior matters only if it is not strictly positive
on the entire $\mathcal{R}$. Then it is the L-projection of $\nu^n$ on
$\mathcal{Q}^\pi$ that should be selected (recall the General prior LCoLT).

2. Confront the $\beta$-problem with the following $\alpha$-problem (also known as
the Boltzmann Jaynes Inverse Problem): let there be a source $q$ that emits letters
from an alphabet $\mathcal{X}$. From the source $q$ an $n$-type was drawn. We are not
given the actual $n$-type, but rather a set $\Pi$ to which the $n$-type belongs. Given
the alphabet $\mathcal{X}$, the source $q$ and the set $\Pi$, the objective is to select
an $n$-type $\nu^n \in \Pi$.

The CCP associated with the $\alpha$-problem is solved by CoLT and GCP, provided
that $\Pi$ is a convex, closed rare set. The Theorems imply that, at least for
sufficiently large $n$, the I-projection of $q$ on $\Pi$ should be selected.

6. Endnotes 

1) The terminology and notation of this paper follow more or less closely [2],
[4], [6], [7]. The brief survey of Large Deviations Theorems for Empirical Measures
(Sect. 3) draws from the same sources. For the evolution of the results see, among
others, [1], [3], [7], [8], [11], [12], [15], [17], [18], [21], [22], [23], [24], [25].
The inequalities used in Sect. 4 belong to the standard tool kit of the Method of
Types, cf. [4]. In relation to the Proposition of Sect. 4.1 see also [9]. The
continuous case of conditioning by rare sources (Sect. 4.3) is built in parallel
with [11] and [4].



Footnote 1. Note that in the case of unrestricted $\mathcal{Q}$, $\nu^n$ is known to be
the Non-parametric Maximum Likelihood Estimator of the source. Here, $\nu^n$ is the
Maximum A-posteriori Probability source.

2) This work is motivated by [10], where the problem of selecting between Empirical
Likelihood and Maximum Entropy Empirical Likelihood (cf. [19], [20]) has been
addressed on probabilistic, rather than statistical, grounds. Further discussion,
relevant also to the CCP associated with the $\alpha$- and $\beta$-problems, can be
found there.

3) Any of the results presented here may be stated in terms of reverse
I-projections [5]. For instance, the right-hand side of the General prior LST could
be equivalently expressed as
$-(I(p\|\mathcal{Q}^\pi) - I(p\|\mathcal{P}^\pi))$, where
$I(p\|\mathcal{C}) = \inf_{q \in \mathcal{C}} I(p\|q)$ is the value of the I-divergence
at a reverse I-projection of $p$ on $\mathcal{C}$. The above-mentioned statistical
considerations (and 4) below) served as a motivation for stating the results in
terms of the newly introduced L-divergence, though the L-projection is formally
identical with the reverse I-projection, which is already in use in a parametric
context, cf. [5]. The present work leaves open the issue of whether it is more
advantageous to state the Theorems of conditioning by rare sources in terms of the
L-projection or in terms of the reverse I-projection.
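The equivalence claimed in 3) follows from a short computation, sketched here in LaTeX. The symbol $H(p) = -\sum_x p(x) \log p(x)$ for the Shannon entropy is an assumption of this sketch (it is not used elsewhere in the text), and $H(p)$ is finite because the alphabet is finite.

```latex
\begin{align*}
I(p\|q) &= \sum_x p(x)\log p(x) - \sum_x p(x)\log q(x) = -H(p) - L(q\|p),\\
I(p\|\mathcal{C}) &= \inf_{q\in\mathcal{C}} I(p\|q)
  = -H(p) - \sup_{q\in\mathcal{C}} L(q\|p) = -H(p) - L(\mathcal{C}\|p),\\
L(\mathcal{Q}^{\pi}\|p) - L(\mathcal{P}^{\pi}\|p)
  &= -\bigl(I(p\|\mathcal{Q}^{\pi}) - I(p\|\mathcal{P}^{\pi})\bigr).
\end{align*}
```

The entropy term cancels in the difference, so the decay rate can be read either way.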

4) If $p$ is an $n$-type then the L-divergence is known as Kerridge's inaccuracy;
cf. [13], [14].

5) For any prior $\pi(\cdot)$, the L-projection $\hat q^\pi$ of $p$ on
$\mathcal{Q}^\pi$ is the same as the source which asymptotically has the supremal
value over $\mathcal{Q}^\pi$ of the posterior probability $\pi(q^n \mid \nu^n)$. In the
case of a uniform prior the correspondence holds for any $n$.